Version: Next

Getting Started Guide

Notes

Obfusware experience: Beginner
Requirements:
        A Windows, MacOS, or Linux computer with an internet browser
        Obfusware AG installed in an AWS environment
Approximate time to complete: 20 minutes
Last updated: 16 Aug 2025

Now that Obfusware is installed in AWS, you are ready to create your first Obfusware AWS job and mask a dataset. AWS Glue Studio lets you construct a Glue job visually in just a few steps.

Glue Studio

  • Set the Job details
    Click on the Job details tab.

Job Details

  • Set the Basic property fields to the following values
    Field                                      Default        Value
    -----------------------------------------  -------------  -------------------
    Name                                       Untitled job   First Obfusware Job
    Description                                <empty>        <empty>
    IAM Role                                   <empty>        ObfuswareGlueRole
    Type                                       Spark          Spark
    Glue version                               Glue 5.0       Glue 5.0
    Language                                   Python 3       Python 3
    Worker type                                G 1X           G 1X
    Automatically scale the number of workers  unchecked      unchecked
    Requested number of workers                10             2
    Generate job insights                      checked        checked
    Generate lineage events                    unchecked      unchecked
    Job bookmark                               Disable        Disable
    Job run queuing                            unchecked      unchecked
    Flex execution                             unchecked      unchecked
    Number of retries                          0              0
    Job timeout (minutes)                      480            5
  • Save the job details
    After setting the appropriate job details, save the job by clicking the Save button in the top right corner of the page.

Save Job details
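As a quick cross-check of the Job details step, the sketch below lists which fields need to change from their defaults. The dictionary is a hand transcription of the Basic properties table, not something exported from AWS Glue, and the helper name `changed_fields` is made up for this illustration.

```python
# Hand-transcribed (default, tutorial value) pairs from the Basic properties table.
# Checkboxes are represented as booleans; this is an illustration, not a Glue API payload.
JOB_DETAILS = {
    "Name": ("Untitled job", "First Obfusware Job"),  # name used later with obfusware-manager
    "Description": ("", ""),
    "IAM Role": ("", "ObfuswareGlueRole"),
    "Type": ("Spark", "Spark"),
    "Glue version": ("Glue 5.0", "Glue 5.0"),
    "Language": ("Python 3", "Python 3"),
    "Worker type": ("G 1X", "G 1X"),
    "Automatically scale the number of workers": (False, False),
    "Requested number of workers": (10, 2),
    "Generate job insights": (True, True),
    "Generate lineage events": (False, False),
    "Job bookmark": ("Disable", "Disable"),
    "Job run queuing": (False, False),
    "Flex execution": (False, False),
    "Number of retries": (0, 0),
    "Job timeout (minutes)": (480, 5),
}


def changed_fields(details):
    """Return only the fields whose tutorial value differs from the default."""
    return {
        field: value
        for field, (default, value) in details.items()
        if default != value
    }


if __name__ == "__main__":
    for field, value in changed_fields(JOB_DETAILS).items():
        print(f"{field}: {value}")
```

Only four fields differ from their defaults: the name, the IAM role, the requested number of workers, and the job timeout.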

  • Build the job
    Select the Visual tab

Select Visual

  • Add a source node by clicking the + icon.

Add Source node

  • Select the Amazon S3 source.

Select Amazon S3 source

  • Set the Amazon S3 source parameters:
    • S3 Source Type: S3 location
    • S3 Url: s3://obfusware-381492123456-us-east-1/3.0/resources/sample-data.csv
    • Data format: CSV
    • Delimiter: Comma (,)

S3 Source details

  • Select the Obfusware Column Data Transform and set the transform parameters:
    • Masker 1: USLastNameMasker
    • Column 1: last_name
    • Masker 2: USVariableDateMasker
    • Column 2: dob
    • Masker 3: US555TelephoneMasker
    • Column 3: phone1
    • Masker 4: EmailMasker
    • Column 4: email

Obfusware Column Data Parameters
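Each masker is paired with a column name, so a typo in a column name is an easy mistake to make. The following sketch checks the pairs above against the header row of sample-data.csv (copied from the sample output later in this guide). It uses only the Python standard library and does not call any Obfusware API; `COLUMN_MASKERS` and `missing_columns` are names invented for this example.

```python
import csv
import io

# Column -> masker pairs, as entered in the Obfusware Column Data Transform.
COLUMN_MASKERS = {
    "last_name": "USLastNameMasker",
    "dob": "USVariableDateMasker",
    "phone1": "US555TelephoneMasker",
    "email": "EmailMasker",
}

# Header row of sample-data.csv, copied from the sample output in this guide.
SAMPLE_HEADER = (
    "first_name,last_name,dob,company_name,address,city,county,"
    "state,zip,phone1,phone2,email,web"
)


def missing_columns(header_line, column_maskers):
    """Return the configured columns that do not appear in the CSV header."""
    header = next(csv.reader(io.StringIO(header_line)))
    return [column for column in column_maskers if column not in header]


if __name__ == "__main__":
    missing = missing_columns(SAMPLE_HEADER, COLUMN_MASKERS)
    print("All columns found" if not missing else f"Missing columns: {missing}")
```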

  • Select the Amazon S3 Target and set the Amazon S3 target parameters:
    • Format: CSV
    • Compression Type: None
    • S3 Target Location: s3://obfusware-381492123456-us-east-1/3.0/output/

S3 Target details
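The source parameter points at a single object, while the target parameter is a folder-style prefix ending in a slash. The sketch below splits both URLs into bucket and key with the standard library to make that difference visible. The helper `split_s3_url` is a hypothetical name, not part of AWS or Obfusware tooling.

```python
from urllib.parse import urlparse

SOURCE_URL = "s3://obfusware-381492123456-us-east-1/3.0/resources/sample-data.csv"
TARGET_URL = "s3://obfusware-381492123456-us-east-1/3.0/output/"


def split_s3_url(url):
    """Split an s3:// URL into (bucket, key); raise ValueError for anything else."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URL: {url!r}")
    # The path starts with "/", which is not part of the object key.
    return parsed.netloc, parsed.path.lstrip("/")


if __name__ == "__main__":
    source_bucket, source_key = split_s3_url(SOURCE_URL)
    target_bucket, target_prefix = split_s3_url(TARGET_URL)
    print(source_bucket, source_key)
    print(target_bucket, target_prefix)
    # A prefix ending in "/" means the job chooses the output file name itself.
    print("target is a prefix:", target_prefix.endswith("/"))
```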

  • Save the job by clicking the Save button on the top right of the page.

Enabling Obfusware AWS Glue jobs

Now that your First Obfusware Job has been created, there is one more step to complete before you can run it. Obfusware relies on a tight integration with AWS Glue. To achieve this integration, Obfusware requires its code, in the form of JAR files and Python files, to be accessible to AWS Glue.

While it is possible to enable an AWS Glue job manually by setting advanced properties in the Job details, the process is involved and error-prone, so Obfusware provides a management tool that enables a job for you.

Obfusware-manager CLI tool

$ obfusware-manager enable-job --help
usage: ObfuswareAWSGlueCLI enable-job [-h] jobs [jobs ...]

positional arguments:
  jobs        Names of existing AWS Glue jobs which will be enabled to execute Obfusware transforms

optional arguments:
  -h, --help  show this help message and exit

This tool is installed on the computer, and under the user account, that was used to install Obfusware. The tool is located in the bin install directory.

        On MacOS or Linux the bin install directory is located at:
            $HOME/.obfusware-aws/<version>/bin

        On Windows the bin install directory is located at:
            %USERPROFILE%\obfusware-aws\<version>\bin
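The two locations differ in more than the path separator: the MacOS/Linux directory name starts with a dot, while the Windows one does not. A small sketch of building the path for either platform follows. The function name `bin_install_dir` is invented here, and the version string "3.0" in the usage lines is only a placeholder; substitute the version you actually installed.

```python
from pathlib import PurePosixPath, PureWindowsPath


def bin_install_dir(home, version, windows=False):
    """Build the bin install directory from a home directory and a version string.

    `home` stands for $HOME on MacOS/Linux or %USERPROFILE% on Windows.
    """
    if windows:
        # Windows layout from this guide: no leading dot in the folder name.
        return PureWindowsPath(home) / "obfusware-aws" / version / "bin"
    # MacOS/Linux layout from this guide: hidden ".obfusware-aws" folder.
    return PurePosixPath(home) / ".obfusware-aws" / version / "bin"


if __name__ == "__main__":
    # "3.0" and the home directories below are placeholders, not real install values.
    print(bin_install_dir("/home/alice", "3.0"))
    print(bin_install_dir(r"C:\Users\alice", "3.0", windows=True))
```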

To enable the First Obfusware Job, simply run the command:
<bininstalldir>/obfusware-manager -v enable-job "First Obfusware Job"

You should see the following output from the obfusware-manager command:

Enabling AWS Glue jobs to execute Obfusware transforms...
Enabling AWS Glue job(First Obfusware Job)
Enabling AWS Glue jobs SUCCEEDED
Success
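Because the job name contains spaces, it must reach obfusware-manager as a single argument. If you script this step, passing the arguments as a list avoids shell quoting problems. The sketch below only builds and prints the command; `enable_job_command` is a hypothetical helper, and the directory `/tmp/bin` is a stand-in for your real bin install directory.

```python
import shlex


def enable_job_command(bin_dir, job_names):
    """Build the argument list for `obfusware-manager -v enable-job <jobs...>`."""
    if not job_names:
        raise ValueError("at least one job name is required")
    # Each job name is one list element, so embedded spaces are preserved.
    return [f"{bin_dir}/obfusware-manager", "-v", "enable-job", *job_names]


if __name__ == "__main__":
    # "/tmp/bin" is a placeholder for the bin install directory.
    command = enable_job_command("/tmp/bin", ["First Obfusware Job"])
    # shlex.join shows how the list would be quoted on a POSIX shell.
    print(shlex.join(command))
    # To actually run it: subprocess.run(command, check=True)
```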

The First Obfusware Job is now ready to run.

Running the Obfusware job

To run the job, select the Runs tab. Then click the Run button on the top right of the page.

Run tab

When the job finishes running, after approximately 1 minute 30 seconds to 1 minute 45 seconds, the run status changes to Succeeded.

To see the result of the run, compare the original file (sample-data.csv) with the results of the run stored in the S3 output/ folder.

Comparing the results

The location and name of the source file, sample-data.csv, are known in advance. The location of the target file is also known, but its exact name is generated by the job. To compare the results, you first need to list the contents of the output/ folder to discover the name. To generate a listing, run the command:

$ aws s3 ls s3://obfusware-381492123456-us-east-1/3.0/output/
2025-08-01 10:24:46 0
2025-08-01 10:53:48 47600073 run-1754060013607-part-r-00000

Look for the file with a timestamp that matches the end of the job run. Once you have discovered the name of the result file you can compare the contents of the source file with the contents of the target file.
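If you would rather not pick the file by eye, the listing can be parsed. The sketch below takes the text printed by `aws s3 ls` (date, time, size, name on each line), skips entries without a name or with zero size, and returns the most recent file. It assumes the listing format shown above; `latest_result_file` is a name chosen for this example.

```python
from datetime import datetime

# Listing text copied from the example above.
SAMPLE_LISTING = """\
2025-08-01 10:24:46 0
2025-08-01 10:53:48 47600073 run-1754060013607-part-r-00000
"""


def latest_result_file(listing):
    """Return the name of the newest non-empty object in `aws s3 ls` output, or None."""
    entries = []
    for line in listing.splitlines():
        parts = line.split(None, 3)
        if len(parts) < 4:
            continue  # no object name, e.g. the zero-byte folder marker
        date, time, size, name = parts
        if int(size) == 0:
            continue  # empty objects cannot be the masked output
        timestamp = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
        entries.append((timestamp, name))
    if not entries:
        return None
    # Tuples sort by timestamp first, so max() picks the latest entry.
    return max(entries)[1]


if __name__ == "__main__":
    print(latest_result_file(SAMPLE_LISTING))
```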

On MacOS or Linux:

$ aws s3 cp s3://obfusware-381492123456-us-east-1/3.0/resources/sample-data.csv - | head -2
first_name,last_name,dob,company_name,address,city,county,state,zip,phone1,phone2,email,web
James,Butt,4/1/1997,"Benton, John B Jr",6649 N Blue Gum St,New Orleans,Orleans,LA,70116,504-621-8927,504-845-1427,jbutt@gmail.com,http://www.bentonjohnbjr.com

$ aws s3 cp s3://obfusware-381492123456-us-east-1/3.0/output/run-1754060013607-part-r-00000 - | head -2
first_name,last_name,dob,company_name,address,city,county,state,zip,phone1,phone2,email,web
James,Ketchersid,5/1/1997,"Benton, John B Jr","6649 N Blue Gum St","New Orleans",Orleans,LA,70116,504-555-7562,504-845-1427,freddy6791@example.com,"http://www.bentonjohnbjr.com"

On Windows:

> aws s3 cp s3://obfusware-381492123456-us-east-1/3.0/resources/sample-data.csv - | more
first_name,last_name,dob,company_name,address,city,county,state,zip,phone1,phone2,email,web
James,Butt,4/1/1997,"Benton, John B Jr",6649 N Blue Gum St,New Orleans,Orleans,LA,70116,504-621-8927,504-845-1427,jbutt@gmail.com,http://www.bentonjohnbjr.com
...

> aws s3 cp s3://obfusware-381492123456-us-east-1/3.0/output/run-1754060013607-part-r-00000 - | more
first_name,last_name,dob,company_name,address,city,county,state,zip,phone1,phone2,email,web
James,Ketchersid,5/1/1997,"Benton, John B Jr","6649 N Blue Gum St","New Orleans",Orleans,LA,70116,504-555-7562,504-845-1427,freddy6791@example.com,http://www.bentonjohnbjr.com
...

By comparing the source fields (last_name, dob, phone1, email) to the corresponding target fields, you can see the results of masking the selected fields.
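The same comparison can be done programmatically. This sketch parses the header and the first data row of each file with Python's csv module and reports which columns differ. Parsing as CSV matters because the output file quotes some fields that the source does not, so a plain text diff would flag unmasked columns such as address. The rows are copied from the example output above; `changed_columns` is a hypothetical helper.

```python
import csv
import io

HEADER = (
    "first_name,last_name,dob,company_name,address,city,county,"
    "state,zip,phone1,phone2,email,web"
)
SOURCE_ROW = (
    'James,Butt,4/1/1997,"Benton, John B Jr",6649 N Blue Gum St,New Orleans,'
    "Orleans,LA,70116,504-621-8927,504-845-1427,jbutt@gmail.com,"
    "http://www.bentonjohnbjr.com"
)
TARGET_ROW = (
    'James,Ketchersid,5/1/1997,"Benton, John B Jr","6649 N Blue Gum St",'
    '"New Orleans",Orleans,LA,70116,504-555-7562,504-845-1427,'
    'freddy6791@example.com,"http://www.bentonjohnbjr.com"'
)


def parse_row(line):
    """Parse one CSV line into a list of field values (quotes removed)."""
    return next(csv.reader(io.StringIO(line)))


def changed_columns(header_line, source_line, target_line):
    """Return the names of columns whose values differ between the two rows."""
    header = parse_row(header_line)
    source = parse_row(source_line)
    target = parse_row(target_line)
    return [name for name, s, t in zip(header, source, target) if s != t]


if __name__ == "__main__":
    print(changed_columns(HEADER, SOURCE_ROW, TARGET_ROW))
```

For the sample rows, the differing columns are exactly the four that were assigned maskers.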